A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits

Neural Information Processing Systems

We present a unified likelihood ratio-based confidence sequence (CS) for *any* (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. We show that this is on par with or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, for the first time, our CS for Bernoulli has a $\mathrm{poly}(S)$-free radius where $S$ is the norm of the unknown parameter. Our first technical novelty is its derivation, which utilizes a time-uniform PAC-Bayesian bound with a uniform prior/posterior, despite the latter being a rather unpopular choice for deriving CSs. As a direct application of our new CS, we propose a simple and natural optimistic algorithm called **OFUGLB**, applicable to *any* generalized linear bandit (**GLB**; Filippi et al. (2010)). Our analysis shows that the celebrated optimistic approach simultaneously attains state-of-the-art regrets for various self-concordant (not necessarily bounded) **GLB**s, and even $\mathrm{poly}(S)$-free regret for bounded **GLB**s, including logistic bandits. The regret analysis, our second technical novelty, follows from combining our new CS with a new proof technique that completely avoids the previously widely used self-concordant control lemma (Faury et al., 2020, Lemma 9). Numerically, **OFUGLB** outperforms or is on par with prior algorithms for logistic bandits.
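The confidence-set-plus-optimism recipe described in the abstract can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's actual algorithm: the radius `beta`, the brute-force 1-D grid, and the arm set are placeholders (the paper derives the exact radius and exploits convexity), but the set $\{\theta : \mathrm{nll}_t(\theta) - \mathrm{nll}_t(\hat{\theta}_t) \le \beta_t\}$ is the likelihood-ratio confidence set, and the optimistic rule picks the arm with the best reward achievable by any parameter in that set.

```python
import numpy as np

# Illustrative sketch only: a likelihood-ratio confidence set for a 1-D
# logistic model and an optimistic (OFU-style) arm choice over it.
# The radius `beta` and the brute-force grid are placeholder assumptions;
# the paper's method uses the derived radius and convex optimization.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    # Bernoulli negative log-likelihood under a logistic link
    p = sigmoid(X @ theta)
    eps = 1e-12
    return -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

def optimistic_arm(arms, X, y, grid, beta):
    # MLE by brute force over the grid (fine for a 1-D illustration)
    losses = np.array([nll(np.array([t]), X, y) for t in grid])
    best = losses.min()
    # likelihood-ratio confidence set: parameters with small excess loss
    cs = [np.array([t]) for t, l in zip(grid, losses) if l - best <= beta]
    # optimism: score each arm by its best-case mean reward over the set
    scores = [max(float(sigmoid(a @ th)) for th in cs) for a in arms]
    return int(np.argmax(scores))
```

With a very large `beta` the confidence set degenerates to the whole grid, so the arm with the larger feature attains the larger optimistic sigmoid value; shrinking `beta` makes the choice increasingly data-driven.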


We emphasize the technical novelty of our upper bound and lower bound, as Reviewer #1, Reviewer #3, and Reviewer #4 commented on the technical novelty of our theoretical results

Neural Information Processing Systems

We thank all the reviewers for their valuable feedback and for appreciating our contributions. **Technical novelty of the upper bound.** In the exploration phase, Jin et al. [2020] set the reward to be … To our knowledge, this idea is new in the literature. For example, for the hard instance in Du et al. [2020], only a single state-action pair has non-zero reward. Moreover, we focus on the reward-free setting, while Du et al. [2020] focused on the standard RL setting. Below we address the specific concerns of each reviewer.


Review for NeurIPS paper: A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm

Neural Information Processing Systems

Summary and Contributions: This paper proposes the SPIDER-EM algorithm, which combines the recently developed SPIDER estimator with the Expectation Maximization (EM) algorithm. The paper also provides a unified framework of stochastic approximation (SA) within EM. The results for SPIDER-EM match the typical results of SPIDER in nonconvex optimization, i.e., $O(\sqrt{n})$, and improve upon the previous $O(n^{2/3})$ result on EM with SVRG. Since they match the typical results of SPIDER in nonconvex optimization, the obtained results should be correct. It is interesting that the SPIDER estimator can be applied to EM algorithms. On the other hand, since other variance reduction techniques (such as SVRG) have already been applied in the EM setting in the literature, SPIDER-EM is a combination of the recently popular variance reduction algorithm SPIDER with previous variance-reduced EM algorithms, which is incremental.
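As a concrete toy illustration of the SPIDER-within-EM idea the review describes, the sketch below runs EM on a 1-D two-component Gaussian mixture (equal weights, unit variance) while maintaining the mean sufficient statistic with a SPIDER-style variance-reduced recursion: a full-batch refresh at each epoch start, then cheap mini-batch difference updates. The function names, model, and epoch/batch sizes are illustrative assumptions, not the paper's actual SPIDER-EM algorithm.

```python
import numpy as np

def resp(x, mu1, mu2):
    # E-step: responsibility of component 1 (equal weights, unit variance)
    a = np.exp(-0.5 * (x - mu1) ** 2)
    b = np.exp(-0.5 * (x - mu2) ** 2)
    return a / (a + b)

def suff(x, mu1, mu2):
    # per-sample sufficient statistics: (r_i, r_i * x_i)
    r = resp(x, mu1, mu2)
    return np.stack([r, r * x], axis=1)

def spider_em(x, mu1, mu2, epochs=5, inner=10, batch=64, seed=0):
    # toy SPIDER-style variance-reduced EM (illustrative sketch only)
    rng = np.random.default_rng(seed)
    n, x_bar = len(x), x.mean()
    for _ in range(epochs):
        # epoch start: full-batch refresh of the mean sufficient statistic
        S = suff(x, mu1, mu2).mean(axis=0)
        old = (mu1, mu2)
        for _ in range(inner):
            # M-step from the current statistic estimate
            r_bar, rx_bar = S
            mu1 = rx_bar / max(r_bar, 1e-12)
            mu2 = (x_bar - rx_bar) / max(1.0 - r_bar, 1e-12)
            # SPIDER recursion: correct S with a mini-batch difference
            idx = rng.integers(0, n, size=batch)
            S = S + (suff(x[idx], mu1, mu2).mean(axis=0)
                     - suff(x[idx], *old).mean(axis=0))
            old = (mu1, mu2)
    return mu1, mu2
```

The variance-reduction step is the `S = S + (new - old)` line: instead of re-estimating the statistic from a fresh mini-batch (high variance), it tracks the full-batch statistic by accumulating mini-batch *differences* between consecutive iterates, which shrink as the parameters stabilize.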




Review for NeurIPS paper: Gradient-EM Bayesian Meta-Learning

Neural Information Processing Systems

Additional Feedback: I think the technical novelty of the proposed algorithm is somewhat limited, since the resulting algorithm is essentially a Bayesian version of Reptile (GEM-BML using the L [1] loss). Nevertheless, I like the reinterpretation given in the paper, especially the coordinate-descent view of the meta-update, decoupling the inner-level update and the outer-level update. Any argument highlighting the technical novelty of the proposed method would be appreciated. The important aspect of the proposed method is its robustness due to being Bayesian. I think the paper could be strengthened by including more experiments on this aspect, testing performance or calibration under distributional shift. The performance of GEM-BML is not that impressive for typical few-shot learning settings (the differences from the baselines are not statistically significant for some settings).


Review for NeurIPS paper: Neural Execution Engines: Learning to Execute Subroutines

Neural Information Processing Systems

Weaknesses: In general, I think the technical novelty of this work is limited. In particular, the authors claim that an additional mask prediction component is necessary to achieve generalization. My understanding is that the training supervision of NEE includes the desired mask at each execution step, which corresponds to the data pointers. However, it is unclear whether the training supervision of the baseline Transformer also includes the ground-truth masks, or only the output value at each step. Basically, I want to know whether the improvement comes from the more fine-grained supervision or from the architectural design.


Review for NeurIPS paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Neural Information Processing Systems

Weaknesses: - The cost function formulation is similar to that of previous work [17] and [43], and the adversarial objective minimized is based on the prior work of [17] and [43]. Given this, the proposed approach does not offer significant technical novelty. Moreover, the experiments are based on sim-to-sim evaluation, where there are two simulators for a task and one of them is called 'real'. I do not see such a characterization as acceptable.